An evolutionary analytical model of a complementary circular code simulating the protein coding genes, the 5' and 3' regions.

Authors

  • D G Arquès
  • J P Fallot
  • C J Michel
Abstract

The self-complementary subset T0 = X0 ∪ {AAA, TTT} with X0 = {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} of 22 trinucleotides has a preferential occurrence in frame 0 (the reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. The subsets T1 = X1 ∪ {CCC} and T2 = X2 ∪ {GGG}, of 21 trinucleotides each, have a preferential occurrence in the shifted frames 1 and 2 respectively (frame 0 shifted by one and two nucleotides respectively in the 5'-3' direction). T1 and T2 are complementary to each other. The subset T0 contains the subset X0, which has the rarity property (6 × 10^-8) of being a complementary maximal circular code with two permuted maximal circular codes X1 and X2 in frames 1 and 2 respectively. X0 is called a C3 code. A quantitative study of these three subsets T0, T1, T2 in the three frames 0, 1, 2 of protein genes, and in the 5' and 3' regions of eukaryotes, shows that their occurrence frequencies are constant functions of the trinucleotide positions in the sequences. The frequencies of T0, T1, T2 in frame 0 of protein genes are 49, 28.5 and 22.5% respectively. In contrast, the frequencies of T0, T1, T2 in the 5' and 3' regions of eukaryotes are independent of the frame. Indeed, the frequency of T0 in the three frames of the 5' (respectively 3') regions is equal to 35.5% (respectively 38%) and is greater than the frequencies of T1 and T2, both equal to 32.25% (respectively 31%) in the three frames. Several frequency asymmetries unexpectedly observed (e.g. the frequency difference between T1 and T2 in frame 0) are related to a new property of the subset T0 involving substitutions.
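The set relations stated above can be checked mechanically. The following is a minimal sketch, not taken from the paper: it assumes that X1 and X2 are obtained from X0 by rotating each trinucleotide left by one and two letters respectively, and the helper names `revcomp` and `rotate` are illustrative.

```python
# X0 as listed in the abstract (20 trinucleotides).
X0 = {"AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
      "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(word):
    """Reverse complement of a trinucleotide (complementary strand read 5'-3')."""
    return word.translate(COMPLEMENT)[::-1]

def rotate(word, k):
    """Circular permutation: move the first k letters to the end."""
    return word[k:] + word[:k]

# Assumption: X1 and X2 are the left rotations of X0 by 1 and 2 letters.
X1 = {rotate(w, 1) for w in X0}
X2 = {rotate(w, 2) for w in X0}

T0 = X0 | {"AAA", "TTT"}
T1 = X1 | {"CCC"}
T2 = X2 | {"GGG"}

print(len(T0), len(T1), len(T2))           # sizes of the three subsets
print({revcomp(w) for w in T0} == T0)      # is T0 self-complementary?
print({revcomp(w) for w in T1} == T2)      # are T1 and T2 complementary?
print(len(T0 | T1 | T2))                   # do they cover all 64 trinucleotides?
```

Under that rotation assumption, the three subsets are pairwise disjoint and together cover the 64 trinucleotides, which is consistent with the stated sizes 22 + 21 + 21.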
An evolutionary analytical model with three parameters (p, q, t), based on an independent mixing of the 22 codons (trinucleotides in frame 0) of T0 with equiprobability (1/22) followed by t ≈ 4 substitutions per codon according to the proportions p ≈ 0.1, q ≈ 0.1 and r = 1 - p - q ≈ 0.8 in the three codon sites respectively, retrieves the frequencies of T0, T1, T2 observed in the three frames of protein genes and explains these asymmetries. Furthermore, the same model (0.1, 0.1, t) after t ≈ 22 substitutions per codon retrieves the statistical properties observed in the three frames of the 5' and 3' regions. The complex behaviour of these analytical curves is totally unexpected and a priori difficult to imagine.
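The two-step process described above (mixing, then substitution) can be illustrated with a small Monte Carlo sketch. This is not the paper's analytical model: the abstract does not specify the substitution scheme, so the sketch assumes that each substitution event picks a codon uniformly, picks a site with probabilities (p, q, r), and replaces the nucleotide by one of the three others with equal probability. The function names `simulate` and `frame_frequencies` are hypothetical, and the output is not claimed to match the published percentages.

```python
import random

BASES = "ACGT"
X0 = ["AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
      "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC"]

def rotate(word, k):
    """Circular permutation: move the first k letters to the end."""
    return word[k:] + word[:k]

# Assumption: X1 and X2 are the left rotations of X0 by 1 and 2 letters.
T0 = set(X0) | {"AAA", "TTT"}
T1 = {rotate(w, 1) for w in X0} | {"CCC"}
T2 = {rotate(w, 2) for w in X0} | {"GGG"}

def simulate(n_codons, t, p=0.1, q=0.1, seed=0):
    """Mix T0 codons uniformly, then apply about t substitutions per codon."""
    rng = random.Random(seed)
    codons = sorted(T0)
    seq = [base for _ in range(n_codons) for base in rng.choice(codons)]
    site_weights = [p, q, 1 - p - q]          # proportions for sites 1, 2, 3
    for _ in range(round(t * n_codons)):
        codon = rng.randrange(n_codons)
        site = rng.choices([0, 1, 2], site_weights)[0]
        i = 3 * codon + site
        # Assumed scheme: uniform choice among the three other nucleotides.
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(seq)

def frame_frequencies(seq, frame):
    """Fractions of trinucleotides in T0, T1, T2 when read in the given frame."""
    words = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return tuple(sum(w in T for w in words) / len(words) for T in (T0, T1, T2))

if __name__ == "__main__":
    sequence = simulate(n_codons=50_000, t=4)
    for frame in (0, 1, 2):
        print(frame, frame_frequencies(sequence, frame))
```

With t = 0 the sequence is a pure concatenation of T0 codons, so frame 0 contains only T0 trinucleotides; increasing t shows how the three frequencies drift in each frame under the assumed substitution scheme.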


Related resources

An evolutionary model of a complementary circular code.

The subset X0 = [sequence: see text] of 20 trinucleotides has a preferential occurrence in frame 0 (a reading frame established by the ATG start trinucleotide) of protein (coding) genes of both prokaryotes and eukaryotes. This subset X0++ has the rarity property (6 × 10^-8) to be a complementary maximal circular code with two permutated maximal circular codes X1 and X2 in frames 1 and 2 respec...


Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the ...


Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a major part of the human genome is transcribed into non-coding sequences rather than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...


An analytical model of gene evolution with six mutation parameters: An application to archaeal circular codes

We develop here an analytical evolutionary model based on a trinucleotide mutation matrix 64 x 64 with six substitution parameters associated with the transitions and transversions in the three trinucleotide sites. It generalizes the previous models based on the nucleotide mutation matrices 4 x 4 and the trinucleotide mutation matrix 64 x 64 with three parameters. It determines at some time t t...


Circular RNA: features, functions and their correlation with diseases especially cancer

In early 2012, the world of science saw a fascinating discovery: circular RNA as a transcription product of thousands of genes in mice and humans. These circular RNAs have recently been grouped as the encoding RNA in an independent group, whose remarkable difference from other RNAs is that they are not linear: their two ends are joined by a covalent connection, creating a loop-...



Journal:
  • Bio Systems

Volume 49, Issue 2

Pages: -

Publication date: 1998